Recognising interest in conversational speech - comparing bag of frames and supra-segmental features

Authors

  • Björn W. Schuller
  • Gerhard Rigoll
Abstract

It is common knowledge that affective and emotion-related states are acoustically well modelled on a supra-segmental level. Nonetheless, successes have been reported for frame-level processing, either by means of dynamic classification or multi-instance learning techniques. In this work, a quantitative, feature-type-wise comparison between frame-level and supra-segmental analysis is carried out for the recognition of interest in human conversational speech. To shed light on the respective differences, the same classifier, namely Support Vector Machines, is used in both cases: once by clustering a ‘bag of frames’ of unknown sequence length employing multi-instance learning techniques, and once by applying statistical functionals to project the time series onto a static feature vector. The Audiovisual Interest Corpus of naturalistic interest serves as the database.
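The two input representations contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of functionals (mean, standard deviation, minimum, maximum), the two descriptors per frame, and the helper names `functionals` and `bag_of_frames` are assumptions made here for illustration, and the clustering and Support Vector Machine training steps are left out.

```python
from statistics import mean, pstdev


def functionals(frames):
    """Project a variable-length sequence of frame vectors onto one static vector.

    For each descriptor track, four illustrative functionals are computed:
    mean, standard deviation, minimum and maximum (an assumed, minimal set).
    """
    if not frames:
        raise ValueError("need at least one frame")
    static = []
    for track in zip(*frames):  # one track per frame-level descriptor
        static.extend([mean(track), pstdev(track), min(track), max(track)])
    return static


def bag_of_frames(frames, label):
    """Keep every frame as its own instance; each inherits the utterance label."""
    return [(list(frame), label) for frame in frames]


# Two toy utterances of different length, two hypothetical descriptors per frame.
short_utt = [[1.0, 10.0], [3.0, 30.0]]
long_utt = [[2.0, 5.0], [4.0, 5.0], [6.0, 5.0]]

static_short = functionals(short_utt)        # length 8 regardless of frame count
static_long = functionals(long_utt)          # length 8 as well
bag_long = bag_of_frames(long_utt, label=1)  # three labelled frame instances
```

Under these assumptions, the supra-segmental route yields one fixed-length vector per utterance, whereas the bag-of-frames route yields as many instances as there are frames, so the number of instances per utterance varies with its length.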

Similar resources

The Effect of Using PRAAT Software on Pre-Intermediate EFL Learners’ Supra Segmental Features

The present study investigated the effect of using PRAAT, a free computer software package for the scientific analysis of speech in phonetics, on pre-intermediate Iranian English as a foreign language (EFL) learners’ supra-segmental features (i.e., intonation and stress). The study used a quasi-experimental design with a pre-test and a post-test. In doing so...

Speech Recognition Only with Supra - segmental Features — Hearing Speech as Music —

This paper proposes a novel paradigm of speech recognition where only the supra-segmental features are utilized. Absolute properties of speech events such as formants and spectrums are completely discarded and only the relative and differential properties of the events are extracted as phonic contrasts. The phonic contrasts are considered as supra-segmental features and they are mathematically ...

GMM Classification of TTS Synthesis: Identification of Original Speaker's Voice

This paper describes two experiments. The first one deals with evaluation of synthetic speech quality by reverse identification of original speakers whose voices had been used for several Czech text-to-speech (TTS) systems. The second experiment was aimed at evaluation of the influence of voice transformation on the original speaker recognition. The paper further describes an analysis of the in...

On compressibility of neural network phonological features for low bit rate speech coding

Phonological features extracted by neural network have shown interesting potential for low bit rate speech vocoding. The span of phonological features is wider than the span of phonetic features, and thus fewer frames need to be transmitted. Moreover, the binary nature of phonological features enables a higher compression ratio at minor quality cost. In this paper, we study the compressibility ...

Supra-segmental Feature Based Speaker Trait Detection

It is well known that speech utterances convey a rich diversity of information concerning the speaker in addition to related semantic content. Such information may contain speaker traits such as personality, likability, health/pathology, etc. To detect speaker traits in human computer interface is an important task toward formulating more efficient and natural computer engagement. This study pr...


Publication date: 2009